On lemmatization in Arabic,

Author

  • Joseph Dichy
Abstract

This work is a ‘prospective extension’ of the lexical work carried out in the DIINAR-MBC Euro-Mediterranean project. It aims to contribute to a crucial issue in Arabic NLP: the operations involved in lemmatization, which necessarily rest on a definition of the Arabic entries of a monolingual or multilingual lexical database. As shown in previous work, lexical entries can be differentiated on the basis of morphosyntactic features, which are, to a large extent, related to meaning. These features are considered in the light of the relations between grammar and lexicon. Formal lemmatization procedures are then proposed, following from an explicit definition of lexical entries in Arabic.

Similar resources

Build Fast and Accurate Lemmatization for Arabic

In this paper we describe the complexity of building a lemmatizer for Arabic, which has a rich and complex derivational morphology, and we discuss the need for fast and accurate lemmatization to enhance Arabic Information Retrieval (IR) results. We also introduce a new data set that can be used to test lemmatization accuracy, and an efficient lemmatization algorithm that outperforms state-of-t...

Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking

We investigate the tasks of general morphological tagging, diacritization, and lemmatization for Arabic. We show that, for all the tasks we consider, both modeling the lexeme explicitly and retuning the weights of individual classifiers for the specific task improve performance.

The Power of Language Music: Arabic Lemmatization through Patterns

The interaction between roots and patterns in Arabic has intrigued lexicographers and morphologists for centuries. While roots provide the consonantal building blocks, patterns provide the syllabic vocalic moulds. While roots provide abstract semantic classes, patterns realize these classes in specific instances. In this way both roots and patterns are indispensable for understanding the deriva...

SAMAR: A System for Subjectivity and Sentiment Analysis of Arabic Social Media

In this work, we present SAMAR, a system for Subjectivity and Sentiment Analysis (SSA) for Arabic social media genres. We investigate how best to represent lexical information; whether standard features are useful; how to treat Arabic dialects; and whether genre-specific features have a measurable impact on performance. Our results suggest that we need individualized solutions for each domain...

Morphological Analysis and Disambiguation for Dialectal Arabic

The many differences between Dialectal Arabic and Modern Standard Arabic (MSA) pose a challenge to the majority of Arabic natural language processing tools, which are designed for MSA. In this paper, we retarget an existing state-of-the-art MSA morphological tagger to Egyptian Arabic (ARZ). Our evaluation demonstrates that our ARZ morphology tagger outperforms its MSA variant on ARZ input in te...

Publication date: 2001